{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 利用Attention机制来加强我们的翻译效果\n",
    "\n",
    "上次的训练结果，大家也看到了 [使用Encoder-Decoder来完成机器翻译](https://github.com/LianHaiMiao/pytorch-lesson-zh/blob/master/NLP/encode_decoder.ipynb) ，很不幸，我们训练出了一个智障，那么有没有什么办法，可以提高它的“智商”呢？让它的翻译效果稍微提升一丢丢呢？\n",
    "\n",
    "有的！ 这就是 **attention 机制**。\n",
    "\n",
    "在上一篇中，我们的 decoder 在各个时刻使用了相同的背景向量。但是，如果解码器可以在不同时刻使用不同的背景向量呢，效果会不会更好呢？\n",
    "\n",
    "以 英语-中文 为例子，给定一个输入序列 \"I Love You\" 和输出序列 \"我爱你\" ，解码器在 t1 时刻可以使用更多编码了 “I” 的信息去解码生成 \"我\" ，在 t2 时刻可以使用更多编码了 \"Love\" 的信息去解码生成 \"爱\"。这听起来就像是解码器在不同的时刻对输入的数据有着不同的 “注意力” 这也就是注意力机制 (attention) 的由来。\n",
    "\n",
    "此时，相比于前一章节的模型，我们只需要更改 Decoder 部分的代码。\n",
    "\n",
    "此时 Decoder 模型的示意图是：\n",
    "\n",
    "\n",
    "![Decoder with attention](./images/attention-decoder-network.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "步骤基本跟前面的 encoder-decoder 类似，仅仅需要少量的改动"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第一步：构建一个Config类，用于保存各种超参数，以及导入各种包"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "from torch.autograd import Variable\n",
    "from torch import optim\n",
    "import torch.nn.functional as F\n",
    "import unicodedata, string, re, random, time, math"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class Config():\n",
    "    def __init__(self):\n",
    "        self.data_path = \"../data/cmn-eng/cmn.txt\" # 数据放在 /data 目录下\n",
    "        self.use_gpu = True\n",
    "        self.hidden_size = 128\n",
    "        self.encoder_lr = 5*1e-4\n",
    "        self.decoder_lr = 5*1e-4\n",
    "        self.train_num = 150000 # 训练数据集的数目\n",
    "        self.print_epoch = 10000\n",
    "        self.MAX_Len = 15\n",
    "config = Config()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第二步：数据预处理\n",
    "\n",
    "准备数据的全部过程如下所示：\n",
    "\n",
    "1. 读取txt文件，并按行分割，再把每一行分割成一个pair (Eng, Chinese)\n",
    "2. 过滤并处理文本信息\n",
    "3. 从每个pair中，制作出 中文词典 和 英文词典\n",
    "4. 构建训练集\n",
    "\n",
    "data下载地址为： http://www.manythings.org/anki/cmn-eng.zip\n",
    "\n",
    "该数据集中还有其他类型的翻译数据 http://www.manythings.org/anki/\n",
    "\n",
    "\n",
    "——————————————————————\n",
    "\n",
    "**这里需要注意，当我们下载完成之后，我们要把数据放在主目录下的 /data 文件夹下**\n",
    "\n",
    "格式：/data/cmn-eng/cmn.txt\n",
    "\n",
    "\n",
    "——————————————————————\n",
    "\n",
    "\n",
    "中文词典和英文词典，我们使用*Lang* 类，该类包含了所有的 中文（英文） -> 数字 或者 数字 -> 中文（英文）的映射。\n",
    "\n",
    "同时，我们要给一句话的其实和结束加上标志符\n",
    "\n",
    "起始符：(Start Of Sentence)\n",
    "\n",
    "SOS_token = 0\n",
    "\n",
    "结束符：(End Of Sentence)\n",
    "\n",
    "EOS_token = 1\n",
    "\n",
    "另外，在这个类中，我们需要添加一个 *word2count* 方法，用来计算各个词出现的次数\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "SOS_token = 0\n",
    "EOS_token = 1\n",
    "\n",
    "class Lang():\n",
    "    def __init__(self, name):\n",
    "        self.name = name\n",
    "        self.word2index = {}\n",
    "        self.index2word = {0: \"SOS\", 1: \"EOS\"}\n",
    "        self.word2count = {}\n",
    "        self.n_words = 2  # Count SOS and EOS\n",
    "    \n",
    "    def addSentence(self, sentence):\n",
    "        if self.name == \"Chinese\":\n",
    "            for word in sentence:\n",
    "                self.addWord(word)\n",
    "        else:\n",
    "            for word in sentence.split(' '):\n",
    "                self.addWord(word)\n",
    "    \n",
    "    def addWord(self, word):\n",
    "        if word not in self.word2index:\n",
    "            self.word2index[word] = self.n_words\n",
    "            self.word2count[word] = 1\n",
    "            self.index2word[self.n_words] = word\n",
    "            self.n_words += 1\n",
    "        else:\n",
    "            self.word2count[word] += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def unicodeToAscii(s):\n",
    "    return ''.join(\n",
    "        c for c in unicodedata.normalize('NFD', s)\n",
    "        if unicodedata.category(c) != 'Mn'\n",
    "    )\n",
    "\n",
    "# Lowercase, trim, and remove non-letter characters\n",
    "def normalizeString(s):\n",
    "    s = unicodeToAscii(s.lower().strip())\n",
    "    s = re.sub(r\"([.!?])\", r\" \\1\", s)\n",
    "    s = re.sub(r\"[^a-zA-Z.!?]+\", r\" \", s)\n",
    "    return s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def readLangs(lang1, lang2, pairs_file, reverse=False):\n",
    "    print(\"Reading lines...\")\n",
    "\n",
    "    # Read the file and split into lines\n",
    "    lines = open(pairs_file, encoding='utf-8').read().strip().split('\\n')\n",
    "    # Split every line into pairs and normalize\n",
    "    pairs = []\n",
    "    for l in lines:\n",
    "        temp = l.split('\\t')\n",
    "        eng_unit = normalizeString(temp[0])\n",
    "        chinese_unit = temp[1]\n",
    "        pairs.append([eng_unit, chinese_unit])\n",
    "    \n",
    "    # Reverse pairs, make Lang instances\n",
    "    if reverse:\n",
    "        pairs = [list(reversed(p)) for p in pairs]\n",
    "        input_lang = Lang(lang2)\n",
    "        output_lang = Lang(lang1)\n",
    "    else:\n",
    "        input_lang = Lang(lang1)\n",
    "        output_lang = Lang(lang2)\n",
    "        \n",
    "    return input_lang, output_lang, pairs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "MAX_LENGTH = config.MAX_Len  # 长度大于15的我们统统舍弃\n",
    "\n",
    "eng_prefixes = (\n",
    "    \"i am \", \"i m \",\n",
    "    \"he is\", \"he s \",\n",
    "    \"she is\", \"she s\",\n",
    "    \"you are\", \"you re \",\n",
    "    \"we are\", \"we re \",\n",
    "    \"they are\", \"they re \",\n",
    "    \"i\", \"he\", 'you', 'she', 'we',\n",
    "    'they', 'it'\n",
    ")\n",
    "\n",
    "def filterPair(p):\n",
    "    return len(p[0].split(' ')) < MAX_LENGTH and \\\n",
    "        len(p[1]) < MAX_LENGTH and \\\n",
    "        p[0].startswith(eng_prefixes)\n",
    "\n",
    "def filterPairs(pairs):\n",
    "    return [pair for pair in pairs if filterPair(pair)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reading lines...\n",
      "Read 19777 sentence pairs\n",
      "Trimmed to 9473 sentence pairs\n",
      "Counting words...\n",
      "Counted words:\n",
      "Eng 字典的大小为 3737\n",
      "Chinese 字典的大小为 2638\n",
      "['it s a nice day .', '今天天氣很好。']\n"
     ]
    }
   ],
   "source": [
    "def prepareData(lang1, lang2, pairs_file, reverse=False):\n",
    "    input_lang, output_lang, pairs = readLangs(lang1, lang2, pairs_file, reverse)\n",
    "    print(\"Read %s sentence pairs\" % len(pairs))\n",
    "    pairs = filterPairs(pairs)\n",
    "    print(\"Trimmed to %s sentence pairs\" % len(pairs))\n",
    "    print(\"Counting words...\")\n",
    "    for pair in pairs:\n",
    "        input_lang.addSentence(pair[0])\n",
    "        output_lang.addSentence(pair[1])\n",
    "    print(\"Counted words:\")\n",
    "    print(input_lang.name, \"字典的大小为\", str(input_lang.n_words))\n",
    "    print(output_lang.name, \"字典的大小为\", str(output_lang.n_words))\n",
    "    return input_lang, output_lang, pairs\n",
    "\n",
    "input_lang, output_lang, pairs = prepareData('Eng', 'Chinese', config.data_path)\n",
    "print(random.choice(pairs))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**到目前为止，我们已经把字典构建好了，接下来就是构建训练集**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def indexesFromSentence(lang, sentence):\n",
    "    if lang.name == \"Chinese\":\n",
    "        return [lang.word2index[word] for word in sentence]\n",
    "    else:\n",
    "        return [lang.word2index[word] for word in sentence.split(' ')]\n",
    "\n",
    "def variableFromSentence(lang, sentence, use_gpu):\n",
    "    indexes = indexesFromSentence(lang, sentence)\n",
    "    indexes.append(EOS_token)\n",
    "    result = Variable(torch.LongTensor(indexes).view(-1, 1)) # seq*1\n",
    "    if use_gpu:\n",
    "        return result.cuda()\n",
    "    else:\n",
    "        return result\n",
    "\n",
    "def variablesFromPair(pair, use_gpu):\n",
    "    input_variable = variableFromSentence(input_lang, pair[0], use_gpu)\n",
    "    target_variable = variableFromSentence(output_lang, pair[1], use_gpu)\n",
    "    return (input_variable, target_variable)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(Variable containing:\n",
      "    4\n",
      "  444\n",
      "   70\n",
      "  375\n",
      "  151\n",
      " 1040\n",
      "  695\n",
      "    6\n",
      "    1\n",
      "[torch.cuda.LongTensor of size 9x1 (GPU 0)]\n",
      ", Variable containing:\n",
      "  972\n",
      "  973\n",
      "  254\n",
      "  102\n",
      "   31\n",
      "    6\n",
      "   65\n",
      "  563\n",
      "  563\n",
      "    4\n",
      "    1\n",
      "[torch.cuda.LongTensor of size 11x1 (GPU 0)]\n",
      "), (Variable containing:\n",
      "    4\n",
      "   43\n",
      "  186\n",
      "   40\n",
      "  520\n",
      "    6\n",
      "    1\n",
      "[torch.cuda.LongTensor of size 7x1 (GPU 0)]\n",
      ", Variable containing:\n",
      "    6\n",
      "   67\n",
      "   31\n",
      "  691\n",
      "  692\n",
      "    4\n",
      "    1\n",
      "[torch.cuda.LongTensor of size 7x1 (GPU 0)]\n",
      ")]\n"
     ]
    }
   ],
   "source": [
    "# 随机获取2个训练数据集， 这里我们依旧不用进行 batch 处理，下一章节 attention 机制中，我们再进行 batch 处理\n",
    "example_pairs = [variablesFromPair(random.choice(pairs), config.use_gpu)\n",
    "                      for i in range(2)]\n",
    "print(example_pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第三步：构建编码器\n",
    "\n",
    "编码器的结构，如图所示：\n",
    "\n",
    "\n",
    "![encoder-network](./images/encoder-network.png)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class Encoder(nn.Module):\n",
    "    def __init__(self, input_size, hidden_size):\n",
    "        super(Encoder, self).__init__()\n",
    "        self.hidden_size = hidden_size\n",
    "        self.embedding = nn.Embedding(input_size, hidden_size)\n",
    "        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)\n",
    "    \n",
    "    def forward(self, x, hidden):\n",
    "        embedded = self.embedding(x).view(1, x.size()[0], -1)\n",
    "        output = embedded  # batch*seq*feature\n",
    "        output, hidden = self.gru(output, hidden)\n",
    "        return output, hidden\n",
    "    \n",
    "    def initHidden(self, use_gpu):\n",
    "        result = Variable(torch.zeros(1, 1, self.hidden_size))\n",
    "        if use_gpu:\n",
    "            return result.cuda()\n",
    "        else:\n",
    "            return result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第四步：构建解码器\n",
    "\n",
    "编码器的结构，如图所示：\n",
    "\n",
    "![Decoder with attention](./images/attention-decoder-network.png)\n",
    "\n",
    "\n",
    "**todo_list: 图片中的模型，跟我们这里构建的模型有差异。以后记得补上我们这里的模型图。**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class AttentionDecoder(nn.Module):\n",
    "    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):\n",
    "        super(AttentionDecoder, self).__init__()\n",
    "        self.hidden_size = hidden_size\n",
    "        self.dropout_p = dropout_p\n",
    "        self.max_length = max_length\n",
    "        \n",
    "        self.embedding = nn.Embedding(output_size, hidden_size)\n",
    "        \n",
    "        # attention 机制\n",
    "        self.attn = nn.Sequential(\n",
    "            nn.Linear(self.hidden_size * 2, self.max_length),\n",
    "            nn.Tanh(),\n",
    "            nn.Linear(self.max_length, 1)\n",
    "        )\n",
    "        \n",
    "        # 结合之后的值\n",
    "        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)\n",
    "        \n",
    "        # drop out 防止过拟合\n",
    "        self.dropout = nn.Dropout(self.dropout_p)\n",
    "        \n",
    "        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)\n",
    "        self.out = nn.Linear(hidden_size, output_size)\n",
    "        self.softmax = nn.LogSoftmax()\n",
    "\n",
    "    def forward(self, x, hidden, encoder_outputs):\n",
    "        \"\"\"\n",
    "            x: 1*1\n",
    "            hidden: 1*1*embed_size\n",
    "            encoder_outputs: 1*seq_len*embed_size\n",
    "        \"\"\"\n",
    "        cur_input_data = self.embedding(x).view(1, 1, -1) # 1*1*embed_size\n",
    "        \n",
    "        cur_seq_len = encoder_outputs.size()[1]\n",
    "        hidden_broadcast = hidden.expand(1, cur_seq_len, self.hidden_size)\n",
    "        \n",
    "        # concate 操作根据 hidden 和 encoder_outputs 来求出当前context环境中的权重\n",
    "        encoder_outputs_and_hiddens = torch.cat((encoder_outputs, hidden_broadcast), dim=2)\n",
    "\n",
    "        # 计算 attention weights\n",
    "        attn_weights = F.softmax(\n",
    "            self.attn(encoder_outputs_and_hiddens)) # size: 1 * seq_len * 1\n",
    "        \n",
    "        decoder_context = torch.bmm(attn_weights.view(1, 1, -1), encoder_outputs) # size: 1*1*embed_size\n",
    "        \n",
    "        # 把 context 和 input 结合起来\n",
    "        input_and_context = torch.cat((cur_input_data, decoder_context), dim=2) # size: 1*1*(embed_size+embed_size)\n",
    "        \n",
    "        concat_input = self.attn_combine(input_and_context) # size: 1*1*embed_size\n",
    "      \n",
    "        output, hidden = self.gru(concat_input, hidden)\n",
    "        output = self.softmax(self.out(output[0]))\n",
    "        return output, hidden, attn_weights\n",
    "\n",
    "    def initHidden(self, use_gpu):\n",
    "        result = Variable(torch.zeros(1, 1, self.hidden_size))\n",
    "        if use_gpu:\n",
    "            return result.cuda()\n",
    "        else:\n",
    "            return result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第五步：开始训练\n",
    "\n",
    "定义优化器、损失函数，然后开始进行训练\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 实例化模型\n",
    "\n",
    "encoder = Encoder(input_lang.n_words, config.hidden_size)\n",
    "encoder = encoder.cuda() if config.use_gpu else encoder\n",
    "\n",
    "attention_decoder = AttentionDecoder(config.hidden_size, input_lang.n_words)\n",
    "attention_decoder = attention_decoder.cuda() if config.use_gpu else attention_decoder\n",
    "\n",
    "# 定义优化器\n",
    "\n",
    "encoder_optimizer = optim.Adam(encoder.parameters(), lr=config.encoder_lr)\n",
    "\n",
    "decoder_optimizer = optim.Adam(attention_decoder.parameters(), lr=config.decoder_lr)\n",
    "\n",
    "\n",
    "# 定义损失函数\n",
    "\n",
    "fn_loss = nn.NLLLoss()\n",
    "\n",
    "training_pairs = [variablesFromPair(random.choice(pairs), config.use_gpu)\n",
    "                      for i in range(config.train_num)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loss is: 3.4028\n",
      "loss is: 3.5155\n",
      "loss is: 3.7690\n",
      "loss is: 1.1878\n",
      "loss is: 2.0838\n",
      "loss is: 2.8674\n",
      "loss is: 3.9982\n",
      "loss is: 1.7657\n",
      "loss is: 2.5525\n",
      "loss is: 0.6789\n",
      "loss is: 2.1010\n",
      "loss is: 0.9873\n",
      "loss is: 1.4932\n",
      "loss is: 1.9774\n",
      "loss is: 0.3823\n"
     ]
    }
   ],
   "source": [
    "# 开始训练\n",
    "for iter in range(1, config.train_num+1):\n",
    "    training_pair = training_pairs[iter - 1]\n",
    "    input_variable = training_pair[0]  # seq_len * 1\n",
    "    target_variable = training_pair[1]  # seq_len * 1\n",
    "    \n",
    "    loss = 0\n",
    "    \n",
    "    # 因为有 dropout, 所以我们需要加上 train()\n",
    "    encoder.train()\n",
    "    attention_decoder.train()\n",
    "    \n",
    "    # 训练过程\n",
    "    encoder_hidden = encoder.initHidden(config.use_gpu)\n",
    "    encoder_optimizer.zero_grad()\n",
    "    decoder_optimizer.zero_grad()\n",
    "\n",
    "    input_length = input_variable.size()[0]\n",
    "    target_length = target_variable.size()[0]\n",
    "    \n",
    "    # 传入 encoder\n",
    "    encoder_output, encoder_hidden = encoder(input_variable, encoder_hidden)\n",
    "    \n",
    "    # decoder 起始\n",
    "    decoder_input = Variable(torch.LongTensor([[SOS_token]]))\n",
    "    decoder_input = decoder_input.cuda() if config.use_gpu else decoder_input\n",
    "    \n",
    "    decoder_hidden = encoder_hidden\n",
    "    \n",
    "    for di in range(target_length):\n",
    "        decoder_output, decoder_hidden, decoder_attention = attention_decoder(decoder_input, decoder_hidden, encoder_output)\n",
    "        targ = target_variable[di]\n",
    "        loss += fn_loss(decoder_output, targ)\n",
    "        decoder_input = targ\n",
    "    \n",
    "    # 反向求导\n",
    "    loss.backward()\n",
    "    # 更新梯度\n",
    "    encoder_optimizer.step()\n",
    "    decoder_optimizer.step()\n",
    "    \n",
    "    print_loss = loss.data[0] / target_length\n",
    "    \n",
    "    if iter % config.print_epoch == 0:\n",
    "        print(\"loss is: %.4f\" % (print_loss))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第六步：随机采样，对模型进行测试"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sampling(encoder, decoder):\n",
    "    # 测试模式\n",
    "    encoder.eval()\n",
    "    decoder.eval()\n",
    "    \n",
    "    # 随机选择一个句子\n",
    "    pair = random.choice(pairs)\n",
    "    print('>', pair[0])\n",
    "    print('=', pair[1])\n",
    "    # 扔进模型中，进行翻译\n",
    "    input_variable = variableFromSentence(input_lang, pair[0], config.use_gpu)\n",
    "    input_length = input_variable.size()[0]\n",
    "    encoder_hidden = encoder.initHidden(config.use_gpu)\n",
    "    encoder_output, encoder_hidden = encoder(input_variable, encoder_hidden)\n",
    "    \n",
    "    decoder_input = Variable(torch.LongTensor([[SOS_token]]))\n",
    "    decoder_input = decoder_input.cuda() if config.use_gpu else decoder_input\n",
    "    decoder_hidden = encoder_hidden\n",
    "    \n",
    "    decoded_words = []\n",
    "    \n",
    "    for di in range(config.MAX_Len):\n",
    "        decoder_output, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_hidden, encoder_output)\n",
    "        topv, topi = decoder_output.data.topk(1)\n",
    "        ni = topi[0][0]\n",
    "        if ni == EOS_token:\n",
    "            decoded_words.append('<EOS>')\n",
    "            break\n",
    "        else:\n",
    "            decoded_words.append(output_lang.index2word[ni])\n",
    "        # 把当前的输出当做输入\n",
    "        decoder_input = Variable(torch.LongTensor([ni]))\n",
    "        decoder_input = decoder_input.cuda() if config.use_gpu else decoder_input\n",
    "        \n",
    "    # 对 decoded_words 进行连接，输出结果\n",
    "    output_sentence = ' '.join(decoded_words)\n",
    "    print('<', output_sentence)\n",
    "    print('')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> she felt insecure about her future .\n",
      "= 她對她的未來感到沒有安全感。\n",
      "< 她 對 於 抱 感 到 感 到 。 <EOS>\n",
      "\n",
      "> he even called me an idiot .\n",
      "= 他甚至骂我傻瓜。\n",
      "< 他 叫 我 来 了 湯 姆 。 <EOS>\n",
      "\n",
      "> he doesn t have any friends .\n",
      "= 他没有任何朋友。\n",
      "< 他 没 有 任 何 朋 友 。 <EOS>\n",
      "\n",
      "> i am a good boy .\n",
      "= 我是一个好男孩。\n",
      "< 我 是 个 好 男 人 。 <EOS>\n",
      "\n",
      "> i saw tom almost every day last week .\n",
      "= 我上周幾乎天天見到湯姆。\n",
      "< 我 幾 乎 天 上 周 幾 乎 天 見 過 。 <EOS>\n",
      "\n",
      "> i remember meeting you before .\n",
      "= 我記得以前見到你。\n",
      "< 我 記 得 你 和 我 看 起 。 <EOS>\n",
      "\n",
      "> she kept on working .\n",
      "= 她繼續工作。\n",
      "< 她 繼 續 工 作 。 <EOS>\n",
      "\n",
      "> your tripod is in my office .\n",
      "= 你的三腳架在我的辦公室裏。\n",
      "< 我 的 辦 現 金 之 前 我 的 辦 公 室 上 的 車\n",
      "\n",
      "> they lived a happy life .\n",
      "= 他们生活得很幸福。\n",
      "< 他 一 個 朋 友 住 在 一 個 朋 友 。 <EOS>\n",
      "\n",
      "> she seemed to have been ill .\n",
      "= 好像她病了。\n",
      "< 她 有 两 个 唱 。 <EOS>\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i in range(10):\n",
    "    sampling(encoder, attention_decoder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "结语：总的来说还是蛮智障的，但是看上去的效果确实比单纯的encoder-decoder好点，我们还可以有另外的评价指标来对这个结果进行评判，这里不做具体的说明了，感兴趣的朋友，可以看看 [BLEU](https://www.aclweb.org/anthology/P02-1040.pdf) （Bilingual Evaluation Understudy）。。\n",
    "\n",
    "另外，我们对于英语的 embedding 可以采用别人预训练好的词向量，也可以提升翻译效果。\n",
    "\n",
    "还可以尝试，使用多层GRU， 寻找更多的训练数据。\n",
    "\n",
    "另外，我们还可以尝试使用别的训练集去单独训练 autoencoder，保证 encoder 的效果足够好之后，再使用 decoder 来完成剩下的翻译工作。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
